Homepage Search in Blog Collections

Authors

  • Jangwon Seo
  • W. Bruce Croft
Abstract

A blog homepage consists of many individual blog postings. Current blog search services focus on retrieving postings, but there is also a need to identify relevant blog homepages. In this paper, we investigate the properties of blog collections and describe the differences between blog homepage searches and general web page searches. We also introduce and evaluate a variety of approaches for blog homepage search. Our results show that noise reduction and the appropriate combination of techniques can achieve significant improvements in retrieval performance compared to a baseline approach and a traditional named page finding approach for general web pages.

Similar resources

Leveraging Collection Structure in Information Retrieval With Applications to Search in Conversational Social Media

Social media collections are becoming increasingly important in the everyday life of Internet users. Recent statistics show that sites hosting social media and community-generated content account for five of the top ten most visited websites in the United States [4], are visited regularly by a broad cross-section of Internet users [61, 67, 115] and host an enormous quantity of information [119,...

What's New at TREC: Blog and Legal Discovery Search at TREC-2006

This past year, the Text REtrieval Conference (TREC) started two new tracks. One was the Blog track – given a large collection of blog posts and their comments, the task was to locate opinions about products, people, organizations, etc. The other new track was the Legal Track. This track seeks to build test collections for searches that occur during the discovery portion of a lawsuit. The Legal...

Math-Net, a model for information and communication systems in sciences

A homepage is the Web entry point and a signpost to a Web site (other common terms therefore are "portals" or "sitemaps"). Web sites of (mathematical) departments consist of collections of interrelated information of the institution. A clear and intuitive structure of the homepage is essential for a user-friendly navigation and search. In fact, however, the structure of department homepages dif...

Search in Conversational Social Media Collections

Community generated content has become increasingly important over the past several years: blogs, Wikipedia, online forums, twitter, Yahoo! Answers, Facebook and many other online communities that foster social interaction have flourished. However, studying “Search in Social Media” as a distinct sub-field of information retrieval poses some questions. Although there is a loose consensus of the ...

University of Glasgow at TREC 2009: Experiments with Terrier

In TREC 2009, we extend our Voting Model for the faceted blog distillation, top stories identification, and related entity finding tasks. Moreover, we experiment with our novel xQuAD framework for search result diversification. Besides fostering our research in multiple directions, by participating in such a wide portfolio of tracks, we further develop the indexing and retrieval capabilities of...

Publication date: 2007